The Steep Road to Happily Ever After: An Analysis of Current Visual Storytelling Models
Visual storytelling is an intriguing and complex task that only recently
entered the research arena. In this work, we survey relevant work to date, and
conduct a thorough error analysis of three very recent approaches to visual
storytelling. We categorize and provide examples of common types of errors, and
identify key shortcomings in current work. Finally, we make recommendations for
addressing these limitations in the future. Comment: Accepted to the NAACL 2019 Workshop on Shortcomings in Vision and Language (SiVL).
Latent Neural Differential Equations for Video Generation
Generative Adversarial Networks have recently shown promise for video generation, building on the success of image generation while also addressing a new challenge: time. Although time was analyzed in some early work, the literature has not kept pace with developments in temporal modeling. We propose studying the effects of Neural Differential Equations on modeling the temporal dynamics of video generation. The paradigm of Neural Differential Equations offers several theoretical strengths, including the first continuous representation of time within video generation. To assess the effects of Neural Differential Equations, we investigate how changes in the temporal model affect generated video quality.
Investigating Reproducibility at Interspeech Conferences: A Longitudinal and Comparative Perspective
Reproducibility is a key aspect for scientific advancement across
disciplines, and reducing barriers for open science is a focus area for the
theme of Interspeech 2023. Availability of source code is one of the indicators
that facilitates reproducibility. However, less is known about the rates of
reproducibility at Interspeech conferences in comparison to other conferences
in the field. In order to fill this gap, we have surveyed 27,717 papers at
seven conferences across speech and language processing disciplines. We find
that, despite having a comparable number of accepted papers to the other conferences, Interspeech has up to 40% lower source code availability. In addition to reporting the difficulties we encountered during our research, we also provide recommendations and possible directions to increase reproducibility in further studies.
Exploring variation of results from different experimental conditions
It might reasonably be expected that running multiple experiments for the same task using the same data and model would yield very similar results. Recent research has, however, shown this not to be the case for many NLP experiments. In this paper, we report extensive coordinated work by two NLP groups to run the training and testing pipeline for three neural text simplification (NTS) models under varying experimental conditions, including different random seeds, run-time environments, and dependency versions, yielding a large number of results for each of the three models using the same data and train/dev/test set splits. From one perspective, these results can be interpreted as shedding light on the reproducibility of evaluation results for the three NTS models, and we present an in-depth analysis of the variation observed for different combinations of experimental conditions. From another perspective, the results raise the question of whether the averaged score should be considered the ‘true’ result for each model.
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more or less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction were discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding that the great majority of human evaluations in NLP are not repeatable and/or not reproducible and/or too flawed to justify reproduction paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP.
Reading With Robots: Towards a Human-Robot Book Discussion System for Elderly Adults
As people age, it is critical that they maintain not only their physical health, but also their cognitive health, for instance by engaging in cognitive exercise. Recent advancements in AI have uncovered novel ways to facilitate such exercise. In this thesis, I propose the first human-robot dialogue system designed specifically to promote cognitive exercise in elderly adults, through discussions about interesting metaphors in books. I describe my work to date, including the development of a new, large corpus and an approach for automatically scoring metaphor novelty. Finally, I outline my plans for incorporating this work into the proposed system.